Abstract



This article considers constrained $\ell_1$ minimization methods for the recovery of
high dimensional sparse signals in three settings: noiseless, bounded error and Gaussian
noise. A unified and elementary treatment is given in these noise settings for
two $\ell_1$ minimization methods: the Dantzig selector and $\ell_1$ minimization with an $\ell_2$
constraint. The results of this paper improve the existing results in the literature
by weakening the conditions and tightening the error bounds. The improvement on
the conditions shows that signals with larger support can be recovered accurately.
This paper also establishes connections between the restricted isometry property and the
mutual incoherence property. Some results of Candes, Romberg and Tao (2006) and
Donoho, Elad, and Temlyakov (2006) are extended.

Keywords: Dantzig selector, $\ell_1$ minimization, Lasso, overcomplete representation, sparse
recovery, sparsity.

1 Introduction 

The problem of recovering a high-dimensional sparse signal based on a small number of 
measurements, possibly corrupted by noise, has attracted much recent attention. This 
problem arises in many different settings, including model selection in linear regression, 
constructive approximation, inverse problems, and compressive sensing. 
Suppose we have $n$ observations of the form

$$y = F\beta + z \tag{1.1}$$

where the matrix $F \in \mathbb{R}^{n \times p}$ with $n \ll p$ is given and $z \in \mathbb{R}^n$ is a vector of measurement
errors. The goal is to reconstruct the unknown vector $\beta \in \mathbb{R}^p$. Depending on the setting,
the error vector $z$ can be zero (the noiseless case), bounded, or Gaussian with
$z \sim N(0, \sigma^2 I_n)$. It is now well understood that $\ell_1$ minimization provides an effective way
of reconstructing a sparse signal in all three settings.

A special case of particular interest is when no noise is present in (1.1) and $y = F\beta$.
This is an underdetermined system of linear equations with more variables than
equations. The problem is clearly ill-posed and there are generally infinitely many
solutions. However, in many applications the vector $\beta$ is known to be sparse or nearly
sparse, in the sense that it contains only a small number of nonzero entries. This sparsity
assumption fundamentally changes the problem, making a unique solution possible. Indeed,
in many cases the unique sparse solution can be found exactly through $\ell_1$ minimization:

$$(P) \quad \min \|\gamma\|_1 \quad \text{subject to} \quad F\gamma = y. \tag{1.2}$$

This $\ell_1$ minimization problem has been studied, for example, in Fuchs [11], Candes and
Tao [4] and Donoho [6]. Understanding the noiseless case is not only of significant interest
in its own right, it also provides deep insight into the problem of reconstructing sparse
signals in the noisy case. See, for example, Candes and Tao [4, 5] and Donoho [6, 7].

When noise is present, there are two well-known $\ell_1$ minimization methods. One is $\ell_1$
minimization under an $\ell_2$ constraint on the residuals:

$$(P_1) \quad \min \|\gamma\|_1 \quad \text{subject to} \quad \|y - F\gamma\|_2 \le \epsilon. \tag{1.3}$$

Written in terms of the Lagrangian function of $(P_1)$, this is closely related to finding the
solution to the $\ell_1$ regularized least squares problem:

$$\min \left\{ \|y - F\gamma\|_2^2 + \rho \|\gamma\|_1 \right\}. \tag{1.4}$$

The latter is often called the Lasso in the statistics literature (Tibshirani [13]). Tropp [14]
gave a detailed treatment of the $\ell_1$ regularized least squares problem.

Another method, called the Dantzig selector, was recently proposed by Candes and Tao
[5]. The Dantzig selector solves the sparse recovery problem through $\ell_1$ minimization with
a constraint on the correlation between the residuals and the column vectors of $F$:

$$(DS) \quad \min \|\gamma\|_1 \quad \text{subject to} \quad \|F^T (y - F\gamma)\|_\infty \le \lambda. \tag{1.5}$$

Candes and Tao [5] showed that the Dantzig selector can be computed by solving a linear
program and that it mimics the performance of an oracle procedure up to a logarithmic factor
$\log p$.

It is clear that regularity conditions are needed in order for these problems to be well
behaved. Over the last few years, many interesting results for recovering sparse signals have
been obtained in the framework of the Restricted Isometry Property (RIP). In their seminal
work [4, 5], Candes and Tao considered sparse recovery problems in the RIP framework.
They provided beautiful solutions to the problem under some conditions on the restricted
isometry constant and restricted orthogonality constant (defined in Section 2). Several
different conditions have been imposed in various settings.

In this paper, we consider $\ell_1$ minimization methods for the sparse recovery problem
in three cases: noiseless, bounded error and Gaussian noise. Both the Dantzig selector
(DS) and $\ell_1$ minimization under the $\ell_2$ constraint $(P_1)$ are considered. We give a unified
and elementary treatment for the two methods under the three noise settings. Our results
improve on the existing results in [2, 3, 4, 5] by weakening the conditions and tightening
the error bounds. In all cases we solve the problems under the weaker condition

$$\delta_{1.5k} + \theta_{k,1.5k} < 1$$

where $k$ is the sparsity index and $\delta$ and $\theta$ are respectively the restricted isometry constant
and restricted orthogonality constant defined in Section 2. The improvement on the
condition shows that signals with larger support can be recovered. Although our main interest
is in recovering sparse signals, we state the results in the general setting of reconstructing
an arbitrary signal.

Another widely used condition for sparse recovery is the so-called Mutual Incoherence
Property (MIP), which requires the pairwise correlations among the column vectors of $F$ to
be small. See [8, 9, 11, 12, 14]. We establish connections between the concepts of RIP and
MIP. As an application, we present an improvement to a recent result of Donoho, Elad,
and Temlyakov [8].

The paper is organized as follows. In Section 2, after basic notation and definitions are
reviewed, two elementary inequalities, which allow us to make a finer analysis of the sparse
recovery problem, are introduced. We begin the analysis of $\ell_1$ minimization methods for
sparse recovery by considering exact recovery in the noiseless case in Section 3. Our
result improves the main result in Candes and Tao [4] by using weaker conditions and
providing tighter error bounds. The analysis of the noiseless case provides insight into the
case when the observations are contaminated by noise. We then consider the case of
bounded error in Section 4. The connections between the RIP and MIP are also explored.
The case of Gaussian noise is treated in Section 5. The Appendix contains the proofs of
some technical results.

2 Preliminaries



In this section we first introduce basic notation and definitions, and then develop some 
technical inequalities which will be used in proving our main results. 

Let $p \in \mathbb{N}$. Let $v = (v_1, v_2, \ldots, v_p) \in \mathbb{R}^p$ be a vector. The support of $v$ is the subset of
$\{1, 2, \ldots, p\}$ defined by

$$\mathrm{supp}(v) = \{ i : v_i \neq 0 \}.$$

For an integer $k \in \mathbb{N}$, a vector $v$ is said to be $k$-sparse if $|\mathrm{supp}(v)| \le k$. For a given vector $v$
we shall denote by $v_{\max(k)}$ the vector $v$ with all but the $k$ largest entries (in absolute value)
set to zero, and define $v_{-\max(k)} = v - v_{\max(k)}$, the vector $v$ with the $k$ largest entries (in
absolute value) set to zero. We shall use the standard notation $\|v\|_q$ to denote the $\ell_q$-norm
of the vector $v$.

Let the matrix $F \in \mathbb{R}^{n \times p}$ and $1 \le k \le p$. The $k$-restricted isometry constant $\delta_k$ of $F$ is
defined to be the smallest constant such that

$$(1 - \delta_k) \|c\|_2^2 \le \|Fc\|_2^2 \le (1 + \delta_k) \|c\|_2^2 \tag{2.1}$$

for every vector $c$ which is $k$-sparse. If $k + k' \le p$, we can define another quantity, the
$k, k'$-restricted orthogonality constant $\theta_{k,k'}$, as the smallest number that satisfies

$$|\langle Fc, Fc' \rangle| \le \theta_{k,k'} \|c\|_2 \|c'\|_2$$

for all $c$ and $c'$ such that $c$ and $c'$ are $k$-sparse and $k'$-sparse respectively, and have disjoint
supports. Candes and Tao [4] showed that the constants $\delta_k$ and $\theta_{k,k'}$ are related by the
following inequalities,

$$\theta_{k,k'} \le \delta_{k+k'} \le \theta_{k,k'} + \max(\delta_k, \delta_{k'}).$$
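The truncations $v_{\max(k)}$ and $v_{-\max(k)}$ enter every error bound that follows. As a minimal sketch of how they can be computed (the function names `top_k_split`, `is_k_sparse` and `l1_norm` are illustrative assumptions, and ties between equal magnitudes are broken by index, which the definition leaves open):

```python
def top_k_split(v, k):
    """Return (v_max_k, v_minus_max_k) for a list of floats v.

    v_max_k keeps the k largest entries in absolute value and zeros the rest;
    v_minus_max_k = v - v_max_k. Ties go to the lower index (an assumption).
    """
    order = sorted(range(len(v)), key=lambda i: (-abs(v[i]), i))
    keep = set(order[:k])
    v_max = [v[i] if i in keep else 0.0 for i in range(len(v))]
    v_rest = [v[i] - v_max[i] for i in range(len(v))]
    return v_max, v_rest


def is_k_sparse(v, k):
    """True when the support of v has at most k elements."""
    return sum(1 for x in v if x != 0) <= k


def l1_norm(v):
    """The l1 norm of v."""
    return sum(abs(x) for x in v)
```

For a $k$-sparse vector the second component is identically zero, so any bound proportional to $\|v_{-\max(k)}\|_1$ vanishes.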



Another useful property is as follows. 



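Proposition 2.1 If $k + \sum_{i=1}^{l} k_i \le p$, then

$$\theta_{k, \sum_{i=1}^{l} k_i} \le \sqrt{\sum_{i=1}^{l} \theta_{k,k_i}^2}.$$

In particular, $\theta_{k, \sum_{i=1}^{l} k_i} \le \sqrt{\sum_{i=1}^{l} \delta_{k+k_i}^2}$.

Proof of Proposition 2.1. Let $c$ be $k$-sparse and $c'$ be $(\sum_{i=1}^{l} k_i)$-sparse. Suppose their
supports are disjoint. Decompose $c'$ as

$$c' = c'_1 + c'_2 + \cdots + c'_l$$

such that $c'_i$ is $k_i$-sparse for $i = 1, \ldots, l$ and $\mathrm{supp}(c'_i) \cap \mathrm{supp}(c'_j) = \emptyset$ for $i \neq j$. By the
triangle inequality, the definition of $\theta_{k,k_i}$ and the Cauchy-Schwarz inequality, we have

$$\begin{aligned}
|\langle Fc, Fc' \rangle| = \Big| \Big\langle Fc, \sum_{i=1}^{l} Fc'_i \Big\rangle \Big|
&\le \sum_{i=1}^{l} |\langle Fc, Fc'_i \rangle| \\
&\le \sum_{i=1}^{l} \theta_{k,k_i} \|c\|_2 \|c'_i\|_2 \\
&\le \sqrt{\sum_{i=1}^{l} \theta_{k,k_i}^2}\; \|c\|_2 \sqrt{\sum_{i=1}^{l} \|c'_i\|_2^2}
= \sqrt{\sum_{i=1}^{l} \theta_{k,k_i}^2}\; \|c\|_2 \|c'\|_2.
\end{aligned}$$

This yields $\theta_{k, \sum_{i=1}^{l} k_i} \le \sqrt{\sum_{i=1}^{l} \theta_{k,k_i}^2}$. Since $\theta_{k,k_i} \le \delta_{k+k_i}$, we also have
$\theta_{k, \sum_{i=1}^{l} k_i} \le \sqrt{\sum_{i=1}^{l} \delta_{k+k_i}^2}$.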



Remark: Different conditions on $\delta$ and $\theta$ have been used in the literature. For example,
Candes and Tao [5] impose $\delta_{2k} + \theta_{k,2k} < 1$ and Candes [2] uses $\delta_{2k} < \sqrt{2} - 1$. A direct
consequence of Proposition 2.1 is that $\delta_{2k} < \sqrt{2} - 1$ is in fact a strictly stronger condition
than $\delta_{2k} + \theta_{k,2k} < 1$, since Proposition 2.1 yields $\theta_{k,2k} \le \sqrt{\delta_{2k}^2 + \delta_{2k}^2} = \sqrt{2}\,\delta_{2k}$, which means
that $\delta_{2k} < \sqrt{2} - 1$ implies $\delta_{2k} + \theta_{k,2k} < 1$.

We now introduce two useful elementary inequalities. These inequalities allow us to
perform finer estimation on $\ell_1$, $\ell_2$ norms.

Proposition 2.2 Let $w$ be a positive integer. For any descending chain of real numbers

$$a_1 \ge a_2 \ge \cdots \ge a_w \ge a_{w+1} \ge \cdots \ge a_{2w} \ge 0,$$

we have

$$\sqrt{a_{w+1}^2 + a_{w+2}^2 + \cdots + a_{2w}^2} \le \frac{a_1 + a_2 + \cdots + a_w + a_{w+1} + \cdots + a_{2w}}{2\sqrt{w}}.$$

Proof of Proposition 2.2. Since $a_i \ge a_j$ for $i < j$, we have

$$\begin{aligned}
(a_1 + a_2 + \cdots + a_{2w})^2 &= a_1^2 + a_2^2 + \cdots + a_{2w}^2 + 2\sum_{i<j} a_i a_j \\
&\ge a_1^2 + a_2^2 + \cdots + a_{2w}^2 + 2\sum_{i<j} a_j^2 \\
&= a_1^2 + 3a_2^2 + \cdots + (2w-1)a_w^2 \\
&\quad + (2w+1)a_{w+1}^2 + \cdots + (4w-3)a_{2w-1}^2 + (4w-1)a_{2w}^2 \\
&= \big(a_1^2 + (4w-1)a_{2w}^2\big) + \big(3a_2^2 + (4w-3)a_{2w-1}^2\big) + \cdots \\
&\quad + \big((2w-1)a_w^2 + (2w+1)a_{w+1}^2\big) \\
&\ge 4w\,a_{2w}^2 + 4w\,a_{2w-1}^2 + \cdots + 4w\,a_{w+1}^2.
\end{aligned}$$

Proposition 2.2 can be used to improve the main result in Candes and Tao [5] by
weakening the condition to $\delta_{1.75k} + \theta_{k,1.75k} < 1$. However, the next proposition, which we
will use in proving our main results, is more powerful for our applications.

Proposition 2.3 Let $w$ be a positive integer. Then any descending chain of real numbers

$$a_1 \ge a_2 \ge \cdots \ge a_w \ge a_{w+1} \ge \cdots \ge a_{3w} \ge 0$$

satisfies

$$\sqrt{a_{w+1}^2 + a_{w+2}^2 + \cdots + a_{3w}^2} \le \frac{a_1 + \cdots + a_w + 2(a_{w+1} + \cdots + a_{2w}) + a_{2w+1} + \cdots + a_{3w}}{2\sqrt{2w}}.$$

The proof of Proposition 2.3 is given in the Appendix.
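Both elementary inequalities are easy to probe numerically. The sketch below is only a sanity check on random chains, not a proof; the helper names `prop22_gap`, `prop23_gap` and `random_chain` are assumptions introduced for illustration. Each gap function returns the right-hand side minus the left-hand side, so the inequalities predict a nonnegative value.

```python
import math
import random


def prop22_gap(a):
    """Right side minus left side of Proposition 2.2 for a chain of length 2w."""
    w = len(a) // 2
    lhs = math.sqrt(sum(x * x for x in a[w:]))
    rhs = sum(a) / (2.0 * math.sqrt(w))
    return rhs - lhs


def prop23_gap(a):
    """Right side minus left side of Proposition 2.3 for a chain of length 3w."""
    w = len(a) // 3
    lhs = math.sqrt(sum(x * x for x in a[w:]))
    weighted = sum(a[:w]) + 2.0 * sum(a[w:2 * w]) + sum(a[2 * w:])
    rhs = weighted / (2.0 * math.sqrt(2.0 * w))
    return rhs - lhs


def random_chain(length, rng):
    """A nonnegative, non-increasing sequence of the given length."""
    return sorted((rng.random() for _ in range(length)), reverse=True)
```

A constant chain makes both gaps vanish (up to rounding), which indicates that the constants $2\sqrt{w}$ and $2\sqrt{2w}$ cannot be improved in general.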



3 Signal Recovery in the Noiseless Case 

As mentioned in the introduction, we shall consider recovery of sparse signals in three cases:
noiseless, bounded error, and Gaussian noise. We begin in this section by considering the
problem of exact recovery of sparse signals when no noise is present. This is an interesting
problem by itself and has been considered in a number of papers. See, for example, Fuchs
[11], Donoho [6], and Candes and Tao [4]. More importantly, the solutions to this "clean"
problem shed light on the noisy case. Our result improves the main result given in Candes
and Tao [4]. The improvement is obtained by using the technical inequalities we developed
in the previous section. Although the focus is on recovering sparse signals, our results are
stated in the general setting of reconstructing an arbitrary signal.

Let $F \in \mathbb{R}^{n \times p}$ with $n < p$ and suppose we are given $F$ and $y$ where $y = F\beta$ for some
unknown vector $\beta$. The goal is to recover $\beta$ exactly when it is sparse. Candes and Tao
showed that a sparse solution can be obtained by $\ell_1$ minimization, which is then solved via
linear programming.
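To make the noiseless problem concrete on a toy instance, the sketch below solves $\min \|\gamma\|_1$ subject to $F\gamma = y$ without an LP solver. It swaps in brute-force enumeration of basic solutions: when $F$ has full row rank, a linear program with a finite optimum attains it at a basic solution, which here is supported on $n$ linearly independent columns. This is only practical for very small $p$ and is not the method used in the cited work; the names `solve_square` and `l1_min_equality` are illustrative assumptions.

```python
from itertools import combinations


def solve_square(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    Returns None when A is (numerically) singular.
    """
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def l1_min_equality(F, y):
    """Brute-force minimizer of ||g||_1 subject to F g = y.

    Tries every set of n columns, solves the square system, and keeps the
    candidate with the smallest l1 norm. Assumes F (n x p) has full row rank.
    """
    n, p = len(F), len(F[0])
    best, best_norm = None, float("inf")
    for cols in combinations(range(p), n):
        sub = [[F[i][j] for j in cols] for i in range(n)]
        x = solve_square(sub, y)
        if x is None:
            continue
        norm = sum(abs(t) for t in x)
        if norm < best_norm - 1e-12:
            g = [0.0] * p
            for j, t in zip(cols, x):
                g[j] = t
            best, best_norm = g, norm
    return best
```

With `F = [[1, 0, 1, 1], [0, 1, 1, -1]]` and the 1-sparse signal `[0, 0, 2, 0]`, the observation is `y = [2, 2]` and the routine returns the original signal, even though the system has infinitely many solutions.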

Theorem 3.1 (Candes and Tao [4]) Let $F \in \mathbb{R}^{n \times p}$. Suppose $k \ge 1$ satisfies

$$\delta_k + \theta_{k,k} + \theta_{k,2k} < 1. \tag{3.1}$$

Let $\beta$ be a $k$-sparse vector and $y := F\beta$. Then $\beta$ is the unique minimizer to the problem

$$(P) \quad \min \|\gamma\|_1 \quad \text{subject to} \quad F\gamma = y.$$

We shall show that this result can be further improved by a transparent argument. A direct
application of Proposition 2.3 yields the following result, which improves Theorem 3.1 by
weakening the condition from

$$\delta_k + \theta_{k,k} + \theta_{k,2k} < 1$$

to

$$\delta_{1.5k} + \theta_{k,1.5k} < 1.$$

Theorem 3.2 Let $F \in \mathbb{R}^{n \times p}$. Suppose $k \ge 1$ satisfies

$$\delta_{1.5k} + \theta_{k,1.5k} < 1$$

and $y = F\beta$. Then the minimizer $\hat\beta$ to the problem

$$(P) \quad \min \|\gamma\|_1 \quad \text{subject to} \quad F\gamma = y$$

obeys

$$\|\hat\beta - \beta\|_2 \le C k^{-1/2} \|\beta_{-\max(k)}\|_1$$

where $C = \dfrac{2\sqrt{2}\,(1 - \delta_{1.5k})}{1 - \delta_{1.5k} - \theta_{k,1.5k}}$.

In particular, if $\beta$ is a $k$-sparse vector, then $\hat\beta = \beta$, i.e., the $\ell_1$ minimization recovers $\beta$
exactly.

Proof of Theorem 3.2. The proof relies on Proposition 2.3 and makes use of the ideas
from [2, 4, 5]. In this proof, we shall also identify a vector $v = (v_1, v_2, \ldots, v_p) \in \mathbb{R}^p$ as a
function $v : \{1, 2, \ldots, p\} \to \mathbb{R}$ by assigning $v(i) = v_i$.

Let $\hat\beta$ be a solution to the $\ell_1$ minimization problem (P). Let $T_0 = \{n_1, n_2, \ldots, n_k\} \subset
\{1, 2, \ldots, p\}$ be the support of $\beta_{\max(k)}$ and let $h = \hat\beta - \beta$. Write

$$\{1, 2, \ldots, p\} \setminus \{n_1, n_2, \ldots, n_k\} = \{n_{k+1}, n_{k+2}, \ldots, n_p\}$$

such that $|h(n_{k+1})| \ge |h(n_{k+2})| \ge |h(n_{k+3})| \ge \cdots$. Fix an integer $t > 0$ and let

$$T_1 = \{n_{k+1}, n_{k+2}, \ldots, n_{(t+1)k}\}, \quad T_2 = \{n_{(t+1)k+1}, n_{(t+1)k+2}, \ldots, n_{(2t+1)k}\}, \quad \ldots$$

For a subset $E \subset \{1, 2, \ldots, p\}$, we use $I_E$ to denote the characteristic function of $E$, i.e.,

$$I_E(j) = \begin{cases} 1 & \text{if } j \in E, \\ 0 & \text{if } j \notin E. \end{cases}$$

For each $i$, let $h_i = h I_{T_i}$. Then $h$ is decomposed as $h = h_0 + h_1 + h_2 + \cdots$. Note that the
$T_i$'s are pairwise disjoint, $\mathrm{supp}(h_i) \subset T_i$, and $|T_0| = k$, $|T_i| = tk$ for $i > 0$. Without loss of
generality, we assume $k$ is divisible by 4.

For each $i > 1$, we divide $h_i$ into two halves in the following manner:

$$h_i = h_{i1} + h_{i2} \quad \text{with } h_{i1} = h_i I_{T_{i1}} \text{ and } h_{i2} = h_i I_{T_{i2}},$$

where $T_{i1}$ is the first half of $T_i$, i.e.,

$$T_{i1} = \{ n_{((i-1)t+1)k+1}, n_{((i-1)t+1)k+2}, \ldots, n_{((i-1)t+1)k + \frac{tk}{2}} \},$$

and $T_{i2} = T_i \setminus T_{i1}$.

We shall treat $h_1$ as a sum of four functions and divide $T_1$ into 4 equal parts $T_1 =
T_{11} \cup T_{12} \cup T_{13} \cup T_{14}$ with

$$T_{11} = \{n_{k+1}, \ldots, n_{k+\frac{tk}{4}}\}, \quad T_{12} = \{n_{k+\frac{tk}{4}+1}, \ldots, n_{k+\frac{tk}{2}}\},$$

$$T_{13} = \{n_{k+\frac{tk}{2}+1}, \ldots, n_{k+\frac{3tk}{4}}\} \quad \text{and} \quad T_{14} = \{n_{k+\frac{3tk}{4}+1}, \ldots, n_{k+tk}\}.$$

We then define $h_{1i}$ for $1 \le i \le 4$ by $h_{1i} = h_1 I_{T_{1i}}$. It is clear that $h_1 = \sum_{i=1}^{4} h_{1i}$.

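Note that

$$\sum_{i \ge 1} \|h_i\|_1 \le \|h_0\|_1 + 2\|\beta_{-\max(k)}\|_1. \tag{3.2}$$

In fact, since $\|\beta\|_1 \ge \|\hat\beta\|_1$, we have

$$\begin{aligned}
\|\beta\|_1 \ge \|\hat\beta\|_1 = \|\beta + h\|_1 &= \|\beta_{\max(k)} + h_0\|_1 + \|h - h_0 + \beta_{-\max(k)}\|_1 \\
&\ge \|\beta_{\max(k)}\|_1 - \|h_0\|_1 + \sum_{i \ge 1} \|h_i\|_1 - \|\beta_{-\max(k)}\|_1.
\end{aligned}$$

Since $\|\beta\|_1 = \|\beta_{\max(k)}\|_1 + \|\beta_{-\max(k)}\|_1$, this yields $\sum_{i \ge 1} \|h_i\|_1 \le \|h_0\|_1 + 2\|\beta_{-\max(k)}\|_1$.

The following claim follows from our Proposition 2.3.

Claim.

$$\|h_{13} + h_{14}\|_2 + \sum_{i \ge 2} \|h_i\|_2 \le \frac{\sum_{i \ge 1} \|h_i\|_1}{\sqrt{tk}} \le \frac{\|h_0\|_2}{\sqrt{t}} + \frac{2\|\beta_{-\max(k)}\|_1}{\sqrt{tk}}. \tag{3.3}$$

In fact, from Proposition 2.3 and the fact that $\|h_{11}\|_1 \ge \|h_{12}\|_1 \ge \|h_{13}\|_1 \ge \|h_{14}\|_1$, we have

$$\|h_{12}\|_1 + 2\|h_{13}\|_1 + \|h_{14}\|_1 \le \frac{2}{3}\big(2\|h_{11}\|_1 + 2\|h_{12}\|_1 + \|h_{13}\|_1 + \|h_{14}\|_1\big).$$

It then follows from Proposition 2.3 that

$$\begin{aligned}
\|h_{13} + h_{14}\|_2 &\le \frac{\|h_{12}\|_1 + 2\|h_{13}\|_1 + \|h_{14}\|_1}{2\sqrt{tk/2}} \\
&\le \frac{2}{3} \cdot \frac{2\|h_{11}\|_1 + 2\|h_{12}\|_1 + \|h_{13}\|_1 + \|h_{14}\|_1}{2\sqrt{tk/2}} \\
&\le \frac{2\|h_{11}\|_1 + 2\|h_{12}\|_1 + \|h_{13}\|_1 + \|h_{14}\|_1}{2\sqrt{tk}}.
\end{aligned}$$

Proposition 2.3 also yields

$$\|h_2\|_2 \le \frac{\|h_{13} + h_{14}\|_1 + 2\|h_{21}\|_1 + \|h_{22}\|_1}{2\sqrt{tk}}$$

and

$$\|h_i\|_2 \le \frac{\|h_{(i-1)2}\|_1 + 2\|h_{i1}\|_1 + \|h_{i2}\|_1}{2\sqrt{tk}}$$

for any $i > 2$. Therefore,

$$\begin{aligned}
\|h_{13} + h_{14}\|_2 + \sum_{i \ge 2} \|h_i\|_2
&\le \frac{2\|h_{11}\|_1 + 2\|h_{12}\|_1 + \|h_{13}\|_1 + \|h_{14}\|_1}{2\sqrt{tk}} \\
&\quad + \frac{\|h_{13} + h_{14}\|_1 + 2\|h_{21}\|_1 + \|h_{22}\|_1}{2\sqrt{tk}} \\
&\quad + \frac{\|h_{22}\|_1 + 2\|h_{31}\|_1 + \|h_{32}\|_1}{2\sqrt{tk}} + \cdots \\
&\le \frac{2\|h_1\|_1 + 2\|h_2\|_1 + 2\|h_3\|_1 + \cdots}{2\sqrt{tk}}
= \frac{\sum_{i \ge 1} \|h_i\|_1}{\sqrt{tk}} \\
&\le \frac{\|h_0\|_1 + 2\|\beta_{-\max(k)}\|_1}{\sqrt{tk}} \qquad \text{by (3.2)} \\
&\le \frac{\|h_0\|_2}{\sqrt{t}} + \frac{2\|\beta_{-\max(k)}\|_1}{\sqrt{tk}}.
\end{aligned}$$

In the rest of our proof we write $h_{11} + h_{12} = h_1'$. Note that $Fh = F\hat\beta - F\beta = 0$. So

$$\begin{aligned}
0 &= |\langle Fh, F(h_0 + h_1') \rangle| \\
&= \Big| \langle F(h_0 + h_1'), F(h_0 + h_1') \rangle + \langle F(h_{13} + h_{14}), F(h_0 + h_1') \rangle + \sum_{i \ge 2} \langle Fh_i, F(h_0 + h_1') \rangle \Big| \\
&\ge (1 - \delta_{(\frac{t}{2}+1)k}) \|h_0 + h_1'\|_2^2 - \theta_{\frac{tk}{2},(\frac{t}{2}+1)k} \|h_{13} + h_{14}\|_2 \|h_0 + h_1'\|_2 \\
&\quad - \sum_{i \ge 2} \theta_{tk,(\frac{t}{2}+1)k} \|h_i\|_2 \|h_0 + h_1'\|_2 \\
&\ge \|h_0 + h_1'\|_2 \Big( (1 - \delta_{(\frac{t}{2}+1)k}) \|h_0 + h_1'\|_2 - \theta_{tk,(\frac{t}{2}+1)k} \Big( \|h_{13} + h_{14}\|_2 + \sum_{i \ge 2} \|h_i\|_2 \Big) \Big) \\
&\ge \|h_0 + h_1'\|_2 \Big( (1 - \delta_{(\frac{t}{2}+1)k}) \|h_0 + h_1'\|_2 - \theta_{tk,(\frac{t}{2}+1)k} \frac{\|h_0\|_2}{\sqrt{t}} - \theta_{tk,(\frac{t}{2}+1)k} \frac{2\|\beta_{-\max(k)}\|_1}{\sqrt{tk}} \Big) \qquad \text{by (3.3)} \\
&\ge \|h_0 + h_1'\|_2 \Big( \Big( 1 - \delta_{(\frac{t}{2}+1)k} - \frac{\theta_{tk,(\frac{t}{2}+1)k}}{\sqrt{t}} \Big) \|h_0 + h_1'\|_2 - \theta_{tk,(\frac{t}{2}+1)k} \frac{2\|\beta_{-\max(k)}\|_1}{\sqrt{tk}} \Big).
\end{aligned}$$

Take $t = 1$. Then

$$\|h_0 + h_1'\|_2 \le \frac{2\theta_{k,1.5k}}{1 - \delta_{1.5k} - \theta_{k,1.5k}}\, k^{-1/2} \|\beta_{-\max(k)}\|_1.$$

It then follows from (3.3) that

$$\begin{aligned}
\|h\|_2^2 &= \|h_0 + h_1'\|_2^2 + \|h_{13} + h_{14}\|_2^2 + \sum_{i \ge 2} \|h_i\|_2^2 \\
&\le \|h_0 + h_1'\|_2^2 + \Big( \|h_{13} + h_{14}\|_2 + \sum_{i \ge 2} \|h_i\|_2 \Big)^2 \\
&\le 2 \big( \|h_0 + h_1'\|_2 + 2k^{-1/2} \|\beta_{-\max(k)}\|_1 \big)^2 \\
&\le 2 \Big( \frac{2(1 - \delta_{1.5k})}{1 - \delta_{1.5k} - \theta_{k,1.5k}}\, k^{-1/2} \|\beta_{-\max(k)}\|_1 \Big)^2.
\end{aligned}$$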

Remarks.

1. Candes and Tao [5] consider the Gaussian noise case. A special case with noise
level $\sigma = 0$ of Theorem 1.1 in that paper improves Theorem 3.1 by weakening the
condition from $\delta_k + \theta_{k,k} + \theta_{k,2k} < 1$ to $\delta_{2k} + \theta_{k,2k} < 1$.

2. This theorem improves the results in [4, 5]. The condition $\delta_{1.5k} + \theta_{k,1.5k} < 1$ is weaker
than $\delta_k + \theta_{k,k} + \theta_{k,2k} < 1$ and $\delta_{2k} + \theta_{k,2k} < 1$.

3. Note that the condition $\delta_{1.75k} < \sqrt{2} - 1$ implies $\delta_{1.5k} + \theta_{k,1.5k} < 1$. This is due to the
fact that $\delta_{1.5k} + \theta_{k,1.5k} \le \delta_{1.5k} + \sqrt{\delta_{1.75k}^2 + \delta_{1.75k}^2} \le (\sqrt{2} + 1)\delta_{1.75k}$ by Proposition 2.1. The
condition $\delta_{1.5k} + \delta_{2.5k} < 1$, which involves only $\delta$, can also be used.

4. The quantity $t$ in the proof can be any number such that $tk \in \mathbb{N}$. As pointed out in
[4, 5], other values of $t$ may be used for obtaining some interesting results.

4 Recovery of Sparse Signals in Bounded Error

We now turn to the case of bounded error. The results obtained in this setting have direct
implications for the case of Gaussian noise, which will be discussed in Section 5.
Let $F \in \mathbb{R}^{n \times p}$ and let

$$y = F\beta + z$$

where the noise $z$ is bounded, i.e., $z \in B$ for some bounded set $B$. In this case the noise $z$
can either be stochastic or deterministic. The $\ell_1$ minimization approach is to estimate $\beta$
by the minimizer $\hat\beta$ of

$$\min \|\gamma\|_1 \quad \text{subject to} \quad y - F\gamma \in B.$$

We shall specifically consider two cases: $B = \{z : \|F^T z\|_\infty \le \lambda\}$ and $B = \{z : \|z\|_2 \le \epsilon\}$.
Our results improve the results in Candes and Tao [4, 5] and Donoho, Elad and Temlyakov
[8].

We shall first consider

$$y = F\beta + z \quad \text{where $z$ satisfies} \quad \|F^T z\|_\infty \le \lambda.$$

Let $\hat\beta$ be the solution to the (DS) problem, i.e., $\hat\beta$ is obtained by solving

$$\min \|\gamma\|_1 \quad \text{subject to} \quad \|F^T (y - F\gamma)\|_\infty \le \lambda. \tag{4.1}$$

The Dantzig selector $\hat\beta$ has the following property.

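Theorem 4.1 Suppose $\beta \in \mathbb{R}^p$ and $y = F\beta + z$ with $z$ satisfying $\|F^T z\|_\infty \le \lambda$. If

$$\delta_{1.5k} + \theta_{k,1.5k} < 1, \tag{4.2}$$

then the solution $\hat\beta$ to (4.1) obeys

$$\|\hat\beta - \beta\|_2 \le C_1 k^{1/2} \lambda + C_2 k^{-1/2} \|\beta_{-\max(k)}\|_1 \tag{4.3}$$

with $C_1 = \dfrac{2\sqrt{3}}{1 - \delta_{1.5k} - \theta_{k,1.5k}}$ and $C_2 = \dfrac{2\sqrt{2}\,(1 - \delta_{1.5k})}{1 - \delta_{1.5k} - \theta_{k,1.5k}}$.

In particular, if $\beta$ is a $k$-sparse vector, then $\|\hat\beta - \beta\|_2 \le C_1 k^{1/2} \lambda$.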
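The bound (4.3) is a simple function of $\delta_{1.5k}$, $\theta_{k,1.5k}$, $k$, $\lambda$ and the $\ell_1$ tail of $\beta$. The following sketch evaluates it for given constants; it assumes the two restricted constants are already known (computing them for a general matrix is a separate and hard problem), and the function names are illustrative rather than taken from the paper.

```python
import math


def dantzig_constants(delta_15k, theta_k_15k):
    """Constants C1, C2 of Theorem 4.1 from delta_{1.5k} and theta_{k,1.5k}."""
    gap = 1.0 - delta_15k - theta_k_15k
    if gap <= 0.0:
        raise ValueError("condition delta_{1.5k} + theta_{k,1.5k} < 1 is violated")
    c1 = 2.0 * math.sqrt(3.0) / gap
    c2 = 2.0 * math.sqrt(2.0) * (1.0 - delta_15k) / gap
    return c1, c2


def dantzig_error_bound(delta_15k, theta_k_15k, k, lam, tail_l1):
    """Right-hand side of (4.3); tail_l1 stands for ||beta_{-max(k)}||_1."""
    c1, c2 = dantzig_constants(delta_15k, theta_k_15k)
    return c1 * math.sqrt(k) * lam + c2 * tail_l1 / math.sqrt(k)
```

Both constants blow up as $\delta_{1.5k} + \theta_{k,1.5k}$ approaches 1, which is why the guard rejects inputs that violate (4.2).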



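Proof of Theorem 4.1. We shall use the same notation as in the proof of Theorem 3.2.
Since $\|\beta\|_1 \ge \|\hat\beta\|_1$, letting $h = \hat\beta - \beta$ and following essentially the same steps as in the
first part of the proof of Theorem 3.2, we get

$$|\langle Fh, F(h_0 + h_1') \rangle| \ge \|h_0 + h_1'\|_2 \Big( (1 - \delta_{1.5k} - \theta_{k,1.5k}) \|h_0 + h_1'\|_2 - \frac{2\theta_{k,1.5k} \|\beta_{-\max(k)}\|_1}{\sqrt{k}} \Big).$$

If $\|h_0 + h_1'\|_2 = 0$, then $h_0 = 0$ and $h_1' = 0$. The latter forces that $h_j = 0$ for every
$j \ge 1$, and we have $\hat\beta - \beta = 0$. Otherwise

$$\|h_0 + h_1'\|_2 \le \frac{|\langle Fh, F(h_0 + h_1') \rangle|}{\|h_0 + h_1'\|_2 (1 - \delta_{1.5k} - \theta_{k,1.5k})} + \frac{2\theta_{k,1.5k} \|\beta_{-\max(k)}\|_1}{\sqrt{k}\,(1 - \delta_{1.5k} - \theta_{k,1.5k})}.$$

To finish the proof, we observe the following.

1. $|\langle Fh, F(h_0 + h_1') \rangle| \le \sqrt{1.5k}\; 2\lambda \|h_0 + h_1'\|_2$.

In fact, let $F_{T_0 \cup T_{11} \cup T_{12}}$ be the $n \times (1.5k)$ submatrix obtained by extracting the columns
of $F$ according to the indices in $T_0 \cup T_{11} \cup T_{12}$, as in [5]. Then

$$\begin{aligned}
|\langle Fh, F(h_0 + h_1') \rangle| &= |\langle (F\hat\beta - y) + z, F(h_0 + h_1') \rangle| \\
&= |\langle F^T_{T_0 \cup T_{11} \cup T_{12}} ((F\hat\beta - y) + z), h_0 + h_1' \rangle| \\
&\le \|F^T_{T_0 \cup T_{11} \cup T_{12}} ((F\hat\beta - y) + z)\|_2 \, \|h_0 + h_1'\|_2 \\
&\le \sqrt{1.5k}\; 2\lambda \|h_0 + h_1'\|_2.
\end{aligned}$$

2. $\|\hat\beta - \beta\|_2 \le \sqrt{2} \big( \|h_0 + h_1'\|_2 + 2k^{-1/2} \|\beta_{-\max(k)}\|_1 \big)$.

In fact,

$$\begin{aligned}
\|\hat\beta - \beta\|_2^2 = \|h\|_2^2 &= \|h_0 + h_1'\|_2^2 + \|h_{13} + h_{14}\|_2^2 + \sum_{i \ge 2} \|h_i\|_2^2 \\
&\le \|h_0 + h_1'\|_2^2 + \Big( \|h_{13} + h_{14}\|_2 + \sum_{i \ge 2} \|h_i\|_2 \Big)^2 \\
&\le \|h_0 + h_1'\|_2^2 + \big( \|h_0\|_2 + 2k^{-1/2} \|\beta_{-\max(k)}\|_1 \big)^2 \qquad \text{by (3.3)} \\
&\le 2 \big( \|h_0 + h_1'\|_2 + 2k^{-1/2} \|\beta_{-\max(k)}\|_1 \big)^2.
\end{aligned}$$

We get the result by combining 1 and 2. This completes the proof.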

We now turn to the second case where the noise $z$ is bounded in $\ell_2$-norm. Let $F \in \mathbb{R}^{n \times p}$
with $n < p$. The problem is to recover the sparse signal $\beta \in \mathbb{R}^p$ from

$$y = F\beta + z$$

where the noise satisfies $\|z\|_2 \le \epsilon$. We shall again consider constrained $\ell_1$ minimization:

$$\min \|\gamma\|_1 \quad \text{subject to} \quad \|y - F\gamma\|_2 \le \eta.$$

By using a similar argument, we have the following result.

Theorem 4.2 Let $F \in \mathbb{R}^{n \times p}$. Suppose $\beta \in \mathbb{R}^p$ is a $k$-sparse vector and $y = F\beta + z$ with
$\|z\|_2 \le \epsilon$. If

$$\delta_{1.5k} + \theta_{k,1.5k} < 1, \tag{4.4}$$

then for any $\eta \ge \epsilon$, the minimizer $\hat\beta$ to the problem

$$\min \|\gamma\|_1 \quad \text{subject to} \quad \|y - F\gamma\|_2 \le \eta$$

obeys

$$\|\hat\beta - \beta\|_2 \le C(\eta + \epsilon) \tag{4.5}$$

with $C = \dfrac{\sqrt{2}\,(1 + \delta_{1.5k})}{1 - \delta_{1.5k} - \theta_{k,1.5k}}$.



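Proof of Theorem 4.2. Notice that the condition $\eta \ge \epsilon$ implies that $\|\hat\beta\|_1 \le \|\beta\|_1$, so
we can use the first part of the proof of Theorem 3.2. The notation used here is the same
as that in the proof of Theorem 3.2.
First, we have

$$\|h_0\|_1 \ge \sum_{i \ge 1} \|h_i\|_1$$

and

$$\|h_0 + h_1'\|_2 \le \frac{|\langle Fh, F(h_0 + h_1') \rangle|}{\|h_0 + h_1'\|_2 (1 - \delta_{1.5k} - \theta_{k,1.5k})}.$$

Note that $\|Fh\|_2 = \|F(\hat\beta - \beta)\|_2 \le \|F\hat\beta - y\|_2 + \|F\beta - y\|_2 \le \eta + \epsilon$.
So

$$\begin{aligned}
\|\hat\beta - \beta\|_2 &\le \sqrt{2}\, \|h_0 + h_1'\|_2 \\
&\le \frac{\sqrt{2}\, \|Fh\|_2 \|F(h_0 + h_1')\|_2}{\|h_0 + h_1'\|_2 (1 - \delta_{1.5k} - \theta_{k,1.5k})} \\
&\le \frac{\sqrt{2}\, (\eta + \epsilon)(1 + \delta_{1.5k}) \|h_0 + h_1'\|_2}{\|h_0 + h_1'\|_2 (1 - \delta_{1.5k} - \theta_{k,1.5k})} \\
&= \frac{\sqrt{2}\,(1 + \delta_{1.5k})}{1 - \delta_{1.5k} - \theta_{k,1.5k}} (\eta + \epsilon).
\end{aligned}$$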


Remarks:

1. Candes, Romberg and Tao [3] showed that, if $\delta_{3k} + 3\delta_{4k} < 2$, then

$$\|\hat\beta - \beta\|_2 \le \frac{4}{\sqrt{3 - 3\delta_{4k}} - \sqrt{1 + \delta_{3k}}}\, \epsilon.$$

(The $\eta$ was set to be $\epsilon$ in [3].) Now suppose $\delta_{3k} + 3\delta_{4k} < 2$. This implies $\delta_{3k} + \delta_{4k} < 1$,
which yields $\delta_{2.4k} + \theta_{1.6k,2.4k} < 1$, since $\delta_{2.4k} \le \delta_{3k}$ and $\theta_{1.6k,2.4k} \le \delta_{4k}$. It then follows
from Theorem 4.2 that, with $\eta = \epsilon$,

$$\|\hat\beta - \beta\|_2 \le \frac{2\sqrt{2}\,(1 + \delta_{2.4k})}{1 - \delta_{2.4k} - \theta_{1.6k,2.4k}}\, \epsilon$$

for all $k'$-sparse vectors $\beta$ where $k' = 1.6k$. Therefore Theorem 4.2 improves the above
result in Candes, Romberg and Tao [3] by enlarging the support of $\beta$ by 60%.

2. Similar to Theorems 3.2 and 4.1, we can have the estimation without assuming that
$\beta$ is $k$-sparse. In the general case, we have

$$\|\hat\beta - \beta\|_2 \le C(\eta + \epsilon) + \frac{2\sqrt{2}\,(1 - \delta_{1.5k})}{1 - \delta_{1.5k} - \theta_{k,1.5k}}\, k^{-1/2} \|\beta_{-\max(k)}\|_1.$$



Connections between RIP and MIP 

In addition to the restricted isometry property (RIP), another commonly used condition
in the sparse recovery literature is the so-called mutual incoherence property (MIP). The
mutual incoherence property of $F$ requires that the coherence bound

$$M = \max_{1 \le i, j \le p,\; i \ne j} |\langle f_i, f_j \rangle| \tag{4.6}$$

be small, where $f_1, f_2, \ldots, f_p$ are the columns of $F$ (the $f_i$'s are also assumed to be of length 1
in $\ell_2$-norm). Many interesting results on sparse recovery have been obtained by imposing
conditions on the coherence bound $M$ and the sparsity $k$, see [8, 9, 11, 12, 14]. For example,
a recent paper, Donoho, Elad, and Temlyakov [8], proved that if $\beta \in \mathbb{R}^p$ is a $k$-sparse vector
and $y = F\beta + z$ with $\|z\|_2 \le \epsilon$, then for any $\eta \ge \epsilon$, the minimizer $\hat\beta$ to the problem

$$\min \|\gamma\|_1 \quad \text{subject to} \quad \|y - F\gamma\|_2 \le \eta$$

satisfies

$$\|\hat\beta - \beta\|_2 \le C(\eta + \epsilon)$$

with $C = \dfrac{1}{\sqrt{1 - (4k - 1)M}}$, provided $k < \dfrac{1/M + 1}{4}$.

We shall now establish some connections between the RIP and MIP and show that the
result of Donoho, Elad, and Temlyakov [8] can be improved under the RIP framework, by
using Theorem 4.2.

The following is a simple result that gives RIP constants from MIP.

Proposition 4.1 Let $M$ be the coherence bound for $F$. Then

$$\delta_k \le (k - 1)M, \quad \text{and} \quad \theta_{k,k'} \le \sqrt{kk'}\, M. \tag{4.7}$$

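Proof of Proposition 4.1. Let $c$ be a $k$-sparse vector. Without loss of generality, we
assume that $\mathrm{supp}(c) = \{1, 2, \ldots, k\}$. A direct calculation shows that

$$\|Fc\|_2^2 = \sum_{i,j=1}^{k} \langle f_i, f_j \rangle c_i c_j = \|c\|_2^2 + \sum_{1 \le i,j \le k,\; i \ne j} \langle f_i, f_j \rangle c_i c_j.$$

Now let us bound the second term. Note that

$$\Big| \sum_{1 \le i,j \le k,\; i \ne j} \langle f_i, f_j \rangle c_i c_j \Big| \le M \sum_{1 \le i,j \le k,\; i \ne j} |c_i c_j| \le M(k - 1) \sum_{i=1}^{k} |c_i|^2 = M(k - 1)\|c\|_2^2.$$

These give us

$$(1 - (k - 1)M)\|c\|_2^2 \le \|Fc\|_2^2 \le (1 + (k - 1)M)\|c\|_2^2$$

and hence

$$\delta_k \le (k - 1)M.$$

For the second inequality, we notice that $M = \theta_{1,1}$. It then follows from Proposition 2.1
that

$$\theta_{k,k'} \le \sqrt{k'}\,\theta_{k,1} \le \sqrt{kk'}\,\theta_{1,1} = \sqrt{kk'}\,M.$$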
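Unlike the restricted isometry constants, the coherence bound $M$ can be computed directly from the columns of $F$, and the inequalities in (4.7) then give computable upper bounds on $\delta_k$ and $\theta_{k,k'}$. The sketch below illustrates this for a matrix stored as a list of rows; the function names are assumptions made for the example, and the columns are normalized inside `coherence` because the definition of $M$ presumes unit-norm columns.

```python
import math


def coherence(F):
    """Coherence bound M of (4.6) for a matrix given as a list of rows.

    Columns are rescaled to unit l2 norm before inner products are taken.
    """
    n, p = len(F), len(F[0])
    cols = [[F[i][j] for i in range(n)] for j in range(p)]
    unit = []
    for c in cols:
        norm = math.sqrt(sum(x * x for x in c))
        unit.append([x / norm for x in c])
    return max(
        abs(sum(a * b for a, b in zip(unit[i], unit[j])))
        for i in range(p)
        for j in range(i + 1, p)
    )


def rip_bounds_from_coherence(M, k, k_prime):
    """Upper bounds (4.7): delta_k <= (k-1) M and theta_{k,k'} <= sqrt(k k') M."""
    return (k - 1) * M, math.sqrt(k * k_prime) * M
```

These bounds are typically loose, but they are the only ingredients needed to translate a coherence condition into the RIP-type condition $\delta_{1.5k} + \theta_{k,1.5k} < 1$.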
Now we are able to show the following result. 

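Theorem 4.3 Suppose $\beta \in \mathbb{R}^p$ is a $k$-sparse vector and $y = F\beta + z$ with $z$ satisfying
$\|z\|_2 \le \epsilon$. Let $kM = t$. If $t < \dfrac{2(1 + M)}{3 + \sqrt{6}}$ (or, equivalently, $k < \dfrac{2(1 + M)}{(3 + \sqrt{6})M}$), then for any $\eta \ge \epsilon$,
the minimizer $\hat\beta$ to the problem

$$\min \|\gamma\|_1 \quad \text{subject to} \quad \|y - F\gamma\|_2 \le \eta$$

obeys

$$\|\hat\beta - \beta\|_2 \le C(\eta + \epsilon) \tag{4.8}$$

with $C = \dfrac{\sqrt{2}\,(2 + 3t - 2M)}{2 + 2M - (3 + \sqrt{6})t}$.

Proof of Theorem 4.3. It follows from Proposition 4.1 that

$$\delta_{1.5k} + \theta_{k,1.5k} \le (1.5k + \sqrt{1.5}\,k - 1)M = (1.5 + \sqrt{1.5})t - M.$$

Since $t < \dfrac{2(1 + M)}{3 + \sqrt{6}}$, the condition $\delta_{1.5k} + \theta_{k,1.5k} < 1$ holds. By Theorem 4.2,

$$\begin{aligned}
\|\hat\beta - \beta\|_2 &\le \frac{\sqrt{2}\,(1 + \delta_{1.5k})}{1 - \delta_{1.5k} - \theta_{k,1.5k}} (\eta + \epsilon) \\
&\le \frac{\sqrt{2}\,(1 + (1.5k - 1)M)}{1 + M - (1.5 + \sqrt{1.5})t} (\eta + \epsilon) \\
&= \frac{\sqrt{2}\,(2 + 3t - 2M)}{2 + 2M - (3 + \sqrt{6})t} (\eta + \epsilon).
\end{aligned}$$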

Remarks. In this theorem, the result of Donoho, Elad and Temlyakov [8] is improved in
the following ways.

1. The sparsity $k$ is relaxed from $k < \dfrac{1/M + 1}{4}$ to $k < \dfrac{2(1/M + 1)}{3 + \sqrt{6}} \approx 1.47 \cdot \dfrac{1/M + 1}{4}$. So roughly
speaking, Theorem 4.3 improves the result in Donoho, Elad and Temlyakov [8] by
enlarging the support of $\beta$ by 47%.

2. It is clear that larger $t$ is preferred. Since $M$ is usually very small, the bound $C$ is
tightened from $C = \dfrac{1}{\sqrt{1 + M - 4t}}$ to $C = \dfrac{\sqrt{2}\,(2 + 3t - 2M)}{2 + 2M - (3 + \sqrt{6})t}$, as $t$ is close to $\frac{1}{4}$.
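The two sparsity thresholds compared in the remarks can be tabulated for any coherence value. The sketch below assumes the threshold $k < (1/M + 1)/4$ for the earlier coherence-based result and $k < 2(1/M + 1)/(3 + \sqrt{6})$ for Theorem 4.3; the function names are illustrative and not part of the paper.

```python
import math


def det_threshold(M):
    """Sparsity bound (1/M + 1)/4 of the earlier coherence-based result."""
    return (1.0 / M + 1.0) / 4.0


def thm43_threshold(M):
    """Sparsity bound 2(1/M + 1)/(3 + sqrt(6)) of Theorem 4.3."""
    return 2.0 * (1.0 / M + 1.0) / (3.0 + math.sqrt(6.0))


def thm43_constant(k, M):
    """Constant C of Theorem 4.3 with t = k M."""
    t = k * M
    denom = 2.0 + 2.0 * M - (3.0 + math.sqrt(6.0)) * t
    if denom <= 0.0:
        raise ValueError("k is too large for the given coherence M")
    return math.sqrt(2.0) * (2.0 + 3.0 * t - 2.0 * M) / denom
```

The ratio of the two thresholds equals $8/(3 + \sqrt{6}) \approx 1.47$ for every $M$, which is the 47% enlargement quoted above. For instance, with $M = 0.01$ a sparsity of $k = 30$ exceeds the first threshold but still satisfies the second.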

5 Recovery of Sparse Signals in Gaussian Noise 

We now turn to the case where the noise is Gaussian. Suppose we observe

$$y = F\beta + z, \quad z \sim N(0, \sigma^2 I_n) \tag{5.1}$$

and wish to recover $\beta$ from $y$ and $F$. We assume that $\sigma$ is known and that the columns
of $F$ are standardized to have unit $\ell_2$ norm. This is a case of significant interest,
in particular in statistics. Many methods, including the Lasso (Tibshirani [13]), LARS
(Efron, Hastie, Johnstone and Tibshirani [10]) and the Dantzig selector (Candes and Tao [5]),
have been introduced and studied.

The following results show that, with large probability, the Gaussian noise z belongs to 
bounded sets. 

Lemma 1 The Gaussian error $z \sim N(0, \sigma^2 I_n)$ satisfies

$$P\Big( \|F^T z\|_\infty \le \sigma\sqrt{2 \log p} \Big) \ge 1 - \frac{1}{2\sqrt{\pi \log p}} \tag{5.2}$$

and

$$P\Big( \|z\|_2 \le \sigma\sqrt{n + 2\sqrt{n \log n}} \Big) \ge 1 - \frac{1}{n}. \tag{5.3}$$

Inequality (5.2) follows from standard probability calculations and inequality (5.3) is proved
in the Appendix.
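The two noise levels in Lemma 1 are the tuning values used for the constrained minimizations in the Gaussian case. As a rough illustration, the sketch below computes both levels and estimates the coverage probability in (5.3) by simulation. The function names, the trial count and the seed are assumptions for the example, and a Monte Carlo estimate is only indicative, not a verification of the lemma.

```python
import math
import random


def dantzig_lambda(sigma, p):
    """lambda_p = sigma * sqrt(2 log p), the level appearing in (5.2)."""
    return sigma * math.sqrt(2.0 * math.log(p))


def l2_radius(sigma, n):
    """epsilon_n = sigma * sqrt(n + 2 sqrt(n log n)), the level in (5.3)."""
    return sigma * math.sqrt(n + 2.0 * math.sqrt(n * math.log(n)))


def coverage_l2(sigma, n, trials, seed=0):
    """Monte Carlo estimate of P(||z||_2 <= epsilon_n), z ~ N(0, sigma^2 I_n)."""
    rng = random.Random(seed)
    eps = l2_radius(sigma, n)
    hits = 0
    for _ in range(trials):
        norm2 = sum(rng.gauss(0.0, sigma) ** 2 for _ in range(n))
        if math.sqrt(norm2) <= eps:
            hits += 1
    return hits / trials
```

Calling `coverage_l2(1.0, 50, 2000)` gives an empirical frequency that can be compared with the guaranteed level $1 - 1/n = 0.98$.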

Lemma 1 suggests that one can apply the results obtained in the previous section for the
bounded error case to solve the Gaussian noise problem. Candes and Tao [5] introduced the
Dantzig selector for sparse recovery in the Gaussian noise setting. Given the observations
in (5.1), the Dantzig selector $\hat\beta^{DS}$ is the minimizer of

$$(DS) \quad \min \|\gamma\|_1 \quad \text{subject to} \quad \|F^T (y - F\gamma)\|_\infty \le \lambda_p \tag{5.4}$$

where $\lambda_p = \sigma\sqrt{2 \log p}$.

In the classical linear regression problem when $p \le n$, the least squares estimator is the
solution to the normal equation

$$F^T y = F^T F\beta. \tag{5.5}$$

The constraint $\|F^T (y - F\beta)\|_\infty \le \lambda_p$ in the convex program (DS) can thus be viewed as a
relaxation of the normal equation (5.5). And similar to the noiseless case, $\ell_1$ minimization
leads to the "sparsest" solution over the space of all feasible solutions.
Candes and Tao [5] showed the following result.

Theorem 5.1 (Candes and Tao [5]) Suppose $\beta \in \mathbb{R}^p$ is a $k$-sparse vector obeying

$$\delta_{2k} + \theta_{k,2k} < 1.$$

Choose $\lambda_p = \sigma\sqrt{2 \log p}$ in (5.4). Then with large probability, the Dantzig selector $\hat\beta$ obeys

$$\|\hat\beta - \beta\|_2 \le C_1 \sigma \sqrt{k} \sqrt{2 \log p}, \tag{5.6}$$

with $C_1 = \dfrac{4}{1 - \delta_k - \theta_{k,2k}}$.$^1$

$^1$ It appears that the constant $C_1$ in Candes and Tao [5] should be $C_1 = 4/(1 - \delta_{2k} - \theta_{k,2k})$.

Another commonly used method in statistics is the Lasso, which solves the $\ell_1$ regularized
least squares problem (1.4). This is equivalent to the $\ell_2$-constrained $\ell_1$ minimization
problem $(P_1)$. In the Gaussian error case, we shall consider a particular setting. Let $\hat\beta^{\ell_2}$
be the minimizer of

$$\min \|\gamma\|_1 \quad \text{subject to} \quad \|y - F\gamma\|_2 \le \epsilon_n \tag{5.7}$$

where $\epsilon_n = \sigma\sqrt{n + 2\sqrt{n \log n}}$.

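Combining our results from the last section together with Lemma 1, we have the following
results on the Dantzig selector $\hat\beta^{DS}$ and the estimator $\hat\beta^{\ell_2}$ obtained from $\ell_1$ minimization
under the $\ell_2$ constraint. Again, these results improve the previous results in the literature
by weakening the conditions and providing more precise bounds.

Theorem 5.2 Suppose $\beta \in \mathbb{R}^p$ is a $k$-sparse vector and the matrix $F$ satisfies

$$\delta_{1.5k} + \theta_{k,1.5k} < 1.$$

Then with probability $P \ge 1 - \dfrac{1}{2\sqrt{\pi \log p}}$, the Dantzig selector $\hat\beta^{DS}$ obeys

$$\|\hat\beta^{DS} - \beta\|_2 \le C_1 \sigma \sqrt{k} \sqrt{2 \log p}, \tag{5.8}$$

with $C_1 = \dfrac{2\sqrt{3}}{1 - \delta_{1.5k} - \theta_{k,1.5k}}$, and with probability at least $1 - \dfrac{1}{n}$, $\hat\beta^{\ell_2}$ obeys

$$\|\hat\beta^{\ell_2} - \beta\|_2 \le D_1 \sigma \sqrt{n + 2\sqrt{n \log n}} \tag{5.9}$$

with $D_1 = \dfrac{2\sqrt{2}\,(1 + \delta_{1.5k})}{1 - \delta_{1.5k} - \theta_{k,1.5k}}$.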



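Remark: Similar to the results obtained in the previous sections, if $\beta$ is not necessarily
$k$-sparse, in general we have, with probability $P \ge 1 - \dfrac{1}{2\sqrt{\pi \log p}}$,

$$\|\hat\beta^{DS} - \beta\|_2 \le C_1 \sigma \sqrt{k} \sqrt{2 \log p} + C_2 k^{-1/2} \|\beta_{-\max(k)}\|_1,$$

where $C_1 = \dfrac{2\sqrt{3}}{1 - \delta_{1.5k} - \theta_{k,1.5k}}$ and $C_2 = \dfrac{2\sqrt{2}\,(1 - \delta_{1.5k})}{1 - \delta_{1.5k} - \theta_{k,1.5k}}$, and with probability $P \ge 1 - \dfrac{1}{n}$,

$$\|\hat\beta^{\ell_2} - \beta\|_2 \le D_1 \sigma \sqrt{n + 2\sqrt{n \log n}} + D_2 k^{-1/2} \|\beta_{-\max(k)}\|_1,$$

where $D_1 = \dfrac{2\sqrt{2}\,(1 + \delta_{1.5k})}{1 - \delta_{1.5k} - \theta_{k,1.5k}}$ and $D_2 = \dfrac{2\sqrt{2}\,(1 - \delta_{1.5k})}{1 - \delta_{1.5k} - \theta_{k,1.5k}}$.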



6 Appendix 

Proof of Proposition 2.3. Let

$$A = \big( (a_1 + \cdots + a_w) + 2(a_{w+1} + \cdots + a_{2w}) + (a_{2w+1} + \cdots + a_{3w}) \big)^2 = A_1 + A_2 + A_3 + A_4 + A_5 + A_6,$$

where each $A_i$ is given (and bounded) by

$$\begin{aligned}
A_1 &= (a_1 + a_2 + \cdots + a_w)^2 \ge a_1^2 + 3a_2^2 + \cdots + (2w - 1)a_w^2, \\
A_2 &= 4(a_{w+1} + a_{w+2} + \cdots + a_{2w})^2 \ge 4\big(a_{w+1}^2 + 3a_{w+2}^2 + \cdots + (2w - 1)a_{2w}^2\big), \\
A_3 &= (a_{2w+1} + a_{2w+2} + \cdots + a_{3w})^2 \ge a_{2w+1}^2 + 3a_{2w+2}^2 + \cdots + (2w - 1)a_{3w}^2, \\
A_4 &= 4(a_1 + a_2 + \cdots + a_w)(a_{w+1} + a_{w+2} + \cdots + a_{2w}) \ge 4w\big(a_{w+1}^2 + a_{w+2}^2 + \cdots + a_{2w}^2\big), \\
A_5 &= 2(a_1 + a_2 + \cdots + a_w)(a_{2w+1} + a_{2w+2} + \cdots + a_{3w}) \ge 2w\big(a_{2w+1}^2 + a_{2w+2}^2 + \cdots + a_{3w}^2\big), \\
A_6 &= 4(a_{w+1} + a_{w+2} + \cdots + a_{2w})(a_{2w+1} + a_{2w+2} + \cdots + a_{3w}) \ge 4w\big(a_{2w+1}^2 + a_{2w+2}^2 + \cdots + a_{3w}^2\big).
\end{aligned}$$

Without loss of generality, we assume that $w$ is even. Write the lower bound for $A_2$ as

$$A_2 \ge A_{21} + A_{22},$$

where

$$A_{21} = 4\big(a_{w+1}^2 + 3a_{w+2}^2 + \cdots + (w - 1)a_{w+\frac{w}{2}}^2 + w a_{w+\frac{w}{2}+1}^2 + w a_{w+\frac{w}{2}+2}^2 + \cdots + w a_{2w}^2\big),$$

and

$$\begin{aligned}
A_{22} &= 4\big(a_{w+\frac{w}{2}+1}^2 + 3a_{w+\frac{w}{2}+2}^2 + \cdots + (w - 1)a_{2w}^2\big) \ge w^2 a_{2w}^2 \\
&= (2w - 1)a_{2w}^2 + (2w - 3)a_{2w}^2 + \cdots + 3a_{2w}^2 + a_{2w}^2.
\end{aligned}$$

Now

$$\begin{aligned}
A_3 + A_5 + A_6 + A_{22} &\ge (6w + 1)a_{2w+1}^2 + (6w + 3)a_{2w+2}^2 + \cdots + (8w - 1)a_{3w}^2 \\
&\quad + (2w - 1)a_{2w}^2 + (2w - 3)a_{2w}^2 + \cdots + 3a_{2w}^2 + a_{2w}^2 \\
&\ge (6w + 1)a_{2w+1}^2 + (6w + 3)a_{2w+2}^2 + \cdots + (8w - 1)a_{3w}^2 \\
&\quad + (2w - 1)a_{2w+1}^2 + (2w - 3)a_{2w+2}^2 + \cdots + 3a_{3w-1}^2 + a_{3w}^2 \\
&= 8w\big(a_{2w+1}^2 + a_{2w+2}^2 + \cdots + a_{3w-1}^2 + a_{3w}^2\big)
\end{aligned}$$

and

$$\begin{aligned}
A_1 + A_{21} + A_4 &\ge a_1^2 + 3a_2^2 + \cdots + (2w - 1)a_w^2 \\
&\quad + 4\big(a_{w+1}^2 + 3a_{w+2}^2 + \cdots + (w - 1)a_{w+\frac{w}{2}}^2 + w a_{w+\frac{w}{2}+1}^2 + w a_{w+\frac{w}{2}+2}^2 + \cdots + w a_{2w}^2\big) \\
&\quad + 4w\big(a_{w+1}^2 + a_{w+2}^2 + \cdots + a_{2w}^2\big) \\
&\ge w^2 a_w^2 + 4(w + 1)a_{w+1}^2 + 4(w + 3)a_{w+2}^2 + \cdots + 4(2w - 1)a_{w+\frac{w}{2}}^2 \\
&\quad + 8w a_{w+\frac{w}{2}+1}^2 + 8w a_{w+\frac{w}{2}+2}^2 + \cdots + 8w a_{2w}^2 \\
&= \underbrace{4(w - 1)a_w^2 + 4(w - 3)a_w^2 + \cdots + 4a_w^2}_{w/2 \text{ terms}} \\
&\quad + 4(w + 1)a_{w+1}^2 + 4(w + 3)a_{w+2}^2 + \cdots + 4(2w - 1)a_{w+\frac{w}{2}}^2 \\
&\quad + 8w a_{w+\frac{w}{2}+1}^2 + 8w a_{w+\frac{w}{2}+2}^2 + \cdots + 8w a_{2w}^2 \\
&\ge 8w\big(a_{w+1}^2 + a_{w+2}^2 + \cdots + a_{2w-1}^2 + a_{2w}^2\big).
\end{aligned}$$

Therefore

$$A \ge 8w\big(a_{w+1}^2 + a_{w+2}^2 + \cdots + a_{2w}^2 + a_{2w+1}^2 + \cdots + a_{3w}^2\big),$$

and the inequality is proved.

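Proof of Lemma 1. The first inequality is standard. We now prove inequality (5.3).
Note that $X = \|z\|_2^2 / \sigma^2$ is a $\chi_n^2$ random variable. It follows from Lemma 4 in Cai [1] that
for any $\lambda > 0$,

$$P(X > (1 + \lambda)n) \le \frac{1}{\lambda\sqrt{\pi n}} \exp\Big\{ -\frac{n}{2}\big(\lambda - \log(1 + \lambda)\big) \Big\}.$$

Hence,

$$P\Big( \|z\|_2 \le \sigma\sqrt{n + 2\sqrt{n \log n}} \Big) = 1 - P(X > (1 + \lambda)n) \ge 1 - \frac{1}{\lambda\sqrt{\pi n}} \exp\Big\{ -\frac{n}{2}\big(\lambda - \log(1 + \lambda)\big) \Big\}$$

where $\lambda = 2\sqrt{n^{-1} \log n}$. It now follows from the fact $\log(1 + \lambda) \le \lambda - \frac{1}{2}\lambda^2 + \frac{1}{3}\lambda^3$ that

$$P\Big( \|z\|_2 \le \sigma\sqrt{n + 2\sqrt{n \log n}} \Big) \ge 1 - \frac{1}{n} \cdot \frac{1}{2\sqrt{\pi \log n}} \exp\Big\{ \frac{4(\log n)^{3/2}}{3\sqrt{n}} \Big\}.$$

Inequality (5.3) now follows by verifying directly that $\dfrac{1}{2\sqrt{\pi \log n}} \exp\Big( \dfrac{4(\log n)^{3/2}}{3\sqrt{n}} \Big) \le 1$ for all
$n \ge 2$.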
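The last step of the proof asks for a direct verification over $n \ge 2$. A short numerical sketch of that check is given below; it evaluates the factor on a finite range only, so it supports the claim without replacing an analytic argument for large $n$. The function names are illustrative assumptions, and `chi2_tail_bound` simply encodes the tail bound quoted in the proof.

```python
import math


def lemma1_factor(n):
    """exp(4 (log n)^{3/2} / (3 sqrt(n))) / (2 sqrt(pi log n)).

    The proof of (5.3) needs this quantity to be at most 1.
    """
    log_n = math.log(n)
    exponent = 4.0 * log_n ** 1.5 / (3.0 * math.sqrt(n))
    return math.exp(exponent) / (2.0 * math.sqrt(math.pi * log_n))


def chi2_tail_bound(n, lam):
    """Bound on P(X > (1 + lam) n) for X ~ chi^2_n, as quoted in the proof."""
    inner = lam - math.log(1.0 + lam)
    return math.exp(-0.5 * n * inner) / (lam * math.sqrt(math.pi * n))
```

On the range $2 \le n < 5000$ the factor stays below 1, and with $\lambda = 2\sqrt{n^{-1}\log n}$ the tail bound is then at most $1/n$, as (5.3) requires.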